I came across Buffon's Needle in the book Proofs from THE BOOK by Martin Aigner and Günter M. Ziegler.
The following problem was posed in 1777 by the Comte de Buffon:
Suppose you drop a short needle on ruled paper - what is the probability that the needle comes to lie in a position where it crosses one of the lines?
The answer - surprisingly - involves $\large{\pi}$, and consequently it provides a method to approximate the value of $\large{\pi}$ through Monte Carlo simulation.
A short needle means that the length of the needle $(\ell)$ must be less than the distance between the lines ($d$) - otherwise a needle can cross two lines and this changes the result to something much less clean. In this short needle case, the probability that a needle comes to rest crossing one of the lines is
$$p = \dfrac{2 \ell}{d \pi} \; .$$
Given this result, an approximate value for $\pi$ would be
$$\hat \pi \approx \dfrac{2\ell}{d} \dfrac{N}{n}$$
where $N$ is the number of trials and $n$ is the number of "hits" i.e. intersections.
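As a purely illustrative calculation (made-up numbers, not the result of a real run): with $\ell = 1$, $d = 2$ and $n = 3{,}183$ hits out of $N = 10{,}000$ throws, the estimate would be

$$\hat \pi \approx \dfrac{2 \times 1}{2} \times \dfrac{10{,}000}{3{,}183} \approx 3.1417 \; .$$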
Let's look at a single needle.
Each needle will have its base (lower end) between two lines at distance $y$ from the lower line. (We can ignore the case where the base of the needle lies exactly on a line as this has probability zero.) And the needle will lie at angle $\alpha$ from the horizontal. The vertical height of the needle is therefore $\ell \sin \alpha$.
So for the needle to intersect the top line we require $\; y + \ell \sin \alpha \ge d$.
$y$ is distributed uniformly from $0$ to $d$ i.e. $y \sim U(0,d)$. And $\alpha$ is distributed uniformly from $0$ to $\pi$ i.e. $\alpha \sim U(0, \pi)$. The probability density value inside those ranges is just the value that makes the area under the probability density function equal to one. E.g. $1/d \times d = 1$.
What's more, these two distributions are independent. So every point in the $2D$ region spanned by those ranges of $y$ and $\alpha$ is equally likely. And so the joint distribution is also uniform and its density can be pictured as a box (or rectangular cuboid).
$$y \sim U(0, d) \; , \qquad \int_0^d \frac{1}{d} \; dy = 1$$

$$\alpha \sim U(0, \pi) \; , \qquad \int_0^\pi \frac{1}{\pi} \; d\alpha = 1$$

$$(y, \alpha) \sim U \big( (0, d) \times (0, \pi) \big) \; , \qquad \int_0^\pi \int_0^d \frac{1}{d \pi} \; dy \; d\alpha = 1$$
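To make the setup concrete, here is a minimal sketch of a single needle drop sampled from these distributions, checking the hit condition above (my own illustration, not from the original derivation; the helper name is hypothetical):

```julia
# Sketch of a single needle drop: sample the base position y ~ U(0, d) and the
# angle α ~ U(0, π), then check the hit condition y + ℓ sin(α) ≥ d.
function single_needle_hit(ℓ, d)
    y = rand() * d      # distance of the needle's base above the lower line
    α = rand() * π      # angle of the needle from the horizontal
    return y + ℓ * sin(α) >= d
end
```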
The probability we want is the volume under this flat density over the region of "hits" i.e. intersections. And recall that a "hit" requires $y + \ell \sin \alpha \ge d$, or equivalently $y \ge d - \ell \sin \alpha$.
So instead of integrating along $y$ from $0$ to $d$ we integrate from $d - \ell \sin \alpha$ to $d$.
i.e.
$$
\begin{aligned}
P(\text{hit}) &= \int_0^\pi \int_{d - \ell \sin \alpha}^d \frac{1}{d \pi} \,dy \,d\alpha \\[1em]
&= \frac{1}{d \pi} \int_0^\pi \Big[ y \Big]_{d - \ell \sin \alpha}^d \,d\alpha \\[1em]
&= \frac{1}{d \pi} \int_0^\pi \ell \sin \alpha \,d\alpha \\[1em]
&= \frac{\ell}{d \pi} \Big[ -\cos \alpha \Big]_0^\pi \\[1em]
&= \frac{2}{\pi} \frac{\ell}{d}
\end{aligned}
$$
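As a quick sanity check of this integral (again my own sketch, in plain Julia, not part of the original post), a midpoint Riemann sum over $\alpha$ reproduces $\frac{2\ell}{d\pi}$:

```julia
# Numerically approximate P(hit) = ∫₀^π (ℓ sin α)/(dπ) dα with a midpoint Riemann sum.
function p_hit_numeric(ℓ, d; steps=10_000)
    Δα = π / steps
    total = 0.0
    for k in 1:steps
        α = (k - 0.5) * Δα        # midpoint of the k-th interval
        total += ℓ * sin(α) * Δα  # inner integral over y contributes ℓ sin α
    end
    return total / (d * π)
end

p_hit_numeric(1, 2), 2 * 1 / (2 * π)   # both ≈ 0.3183
```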
Now let's test the theorem with a simulation using Julia.
The function `rand()` uniformly picks a number from the range $[0, 1)$. So `rand() * d` uniformly picks a number from the range $[0, d)$. We could also have used the Distributions package and called `rand(Uniform(0, d))`.
Note that `d - rand() * d` is equivalent (just horizontally flipped, which is okay as the uniform distribution is symmetric) to `rand() * d`.
So the inner test of

`rand() * d + ℓ * sin(rand() * π) >= d`

simplifies (using the equivalence above) to

`rand() * d <= ℓ * sin(rand() * π)`

or

`rand() <= ℓ / d * sin(rand() * π)`
```julia
function π_est(N, ℓ, d)
    n = 0                # number of "hits"
    r = ℓ / d
    for i in 1:N
        rand() <= r * sin(rand() * π) && (n += 1)
    end
    return 2r * N/n
end
```
Note that each trial is done individually inside the for loop. It would have been an option to generate a vector of random numbers by calling, say for 1,000 trials, `rand(1000)`. Some languages would recommend this technique, but Julia is so quick at running simulations that this isn't necessary and, what's more, we can run many more simulations as we don't need the memory - e.g. `rand(10_000_000)` (with 64-bit floating-point numbers) would require 80MB of memory, but running the simulation in a loop uses (as you can see below) only 176 bytes.
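For comparison, a vectorised version might look like the sketch below (my own illustration, not from the post; the function name is hypothetical); it allocates two length-$N$ vectors of random numbers instead of drawing them inside a loop:

```julia
# Vectorised sketch: allocate N random base positions and N random angles up front,
# then count the hits with a broadcast comparison. Uses O(N) memory, unlike the loop.
function π_est_vec(N, ℓ, d)
    r = ℓ / d
    n = count(rand(N) .<= r .* sin.(rand(N) .* π))
    return 2r * N / n
end
```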
Let's run this 10 million times! - and then compare to the true $\large{\pi}$. We can estimate with any values of $\ell$ and $d$. Let's try $\ell = 1$ and $d = 2$ (recall $\ell < d$).
```julia
@time π_hat = π_est(10_000_000, 1, 2)

# π is a built-in constant in Julia
π

println("absolute difference   : $(round(π_hat - π, digits=5))")
println("percentage difference : $(round((π_hat - π) / π * 100, digits=3))%")
```
A pretty good approximation - which you'd hope for after that crazy number of needles.
The approximate value for $\pi$ from above is $\hat \pi \approx \dfrac{2\ell}{d} \dfrac{N}{n}$.
Calculating the variance of the estimator $\hat \pi$ is not straightforward as it involves the reciprocal of the Binomial distribution. But following the method in the article "Sharpening Buffon's Needle" by Perlman and Wichura, we can estimate the variance.
Start with the variance of the estimator $\dfrac{1}{\hat \pi} \approx \dfrac{d}{2 \ell N} n$ as follows:
$$
\begin{aligned}
n &\sim \mathrm{Binomial}(N, p) \\[1em]
\mathrm{Var}(n) &= Np(1-p) \\[1em]
\mathrm{Var}\left(\frac{1}{\hat \pi}\right) &= \frac{d^2}{4 \ell^2 N^2} \, Np(1-p) = \frac{d^2}{\ell^2} \frac{1}{4N} \, p(1-p)
\end{aligned}
$$
Then, using the delta method (see the Wikipedia page and this page on Cross Validated) with $g(x) = 1/x$, for which $\mathrm{Var}\big(g(X)\big) \approx g'(\mu)^2 \, \mathrm{Var}(X) = \mathrm{Var}(X)/\mu^4$, we can approximate the asymptotic variance of the estimator $\hat \pi$ as follows:
$$
\begin{aligned}
\mathrm{Var}(\hat \pi) &= \mathrm{Var}\left( \frac{1}{\hat \pi} \right) \, \mathrm{E}({\hat \pi})^4 \\[1em]
&= \frac{d^2}{\ell^2} \frac{1}{4N} \, p(1-p) \, \pi^4 \\[1em]
&= \frac{d^2}{\ell^2} \frac{1}{4N} \frac{4 \ell^2}{d^2 \pi^2} \left( \frac{d \pi}{2 \ell} - 1 \right) \pi^4 \hspace{3em} \text{using} \hspace{1em} p(1-p) = p^2 \left( \frac{1}{p} - 1 \right) = \frac{4 \ell^2}{d^2 \pi^2} \left( \frac{d \pi}{2 \ell} - 1 \right) \\[1em]
&= \frac{\pi^2}{N} \left( \frac{d}{\ell} \frac{\pi}{2} - 1 \right)
\end{aligned}
$$
Note that to minimise the variance (of both $\hat \pi$ and $\dfrac{1}{\hat \pi})$ requires minimising $\dfrac{d}{\ell}$ i.e. setting $d = \ell$ as $d$ cannot be shorter than $\ell$.
Let's create convenience functions of the estimated variance and standard deviation, and then test with simulation. For the minimum variance case of $d = \ell$ we'd expect the variance to be $\dfrac{\pi^2}{N} \left( \dfrac{\pi}{2} - 1 \right) \approx \dfrac{5.63}{N}$.
```julia
# Estimated variance and standard deviation of π̂
σ²(N, ℓ, d) = π^2/N * (d/ℓ * π/2 - 1)
σ(N, ℓ, d) = √σ²(N, ℓ, d);
```
For $N=10,000$ (i.e. $10,000$ needles) and $d = \ell$ the variance estimate is:
```julia
N, ℓ, d = 10_000, 1, 1
σ²(N, ℓ, d)
```
Running this simulation $5,000$ times and calculating the sample variance gives:
```julia
using Statistics   # for var

var([π_est(10_000, 1, 1) for _ = 1:5000])
```
- a pretty good match.
We can now say more about the expected accuracy of the estimate of $\pi$. We can expect to have an error of not more than two standard deviations away from the mean about $95\%$ of the time.
For $d = \ell$ and $N = 10,000$ this is:
```julia
2σ(10_000, 1, 1)
```
For $d = \ell$ and $N = 10,000,000$ this is:
```julia
2σ(10_000_000, 1, 1)
```
Now let's run the simulation again with this optimal setting of $d=\ell$.
```julia
π_hat = π_est(10_000_000, 1, 1)

println("π estimate          : $(round(π_hat, digits=6))")
println("π actual            : $(round(float(π), digits=6))")
println("absolute difference : $(round(π_hat - π, digits=6))")
```
We find the estimate is inside this error interval.
Turning this error estimate around, we can see how slow the convergence is for a given confidence interval. If we want a $95 \%$ confidence that the error is not more than $\epsilon$, we should throw $\left( 2 \dfrac{\sqrt{5.63}}{\epsilon} \right)^2$ needles. So if we want the error margin to be within 3 decimal places (i.e. $\epsilon = 0.001$) we'd need to drop over $22,000,000$ needles!
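To make the arithmetic concrete, a tiny helper (hypothetical, just for illustration):

```julia
# Needles required for ~95% confidence that the error is below ε, in the d = ℓ case,
# using Var(π̂) ≈ 5.63 / N from above.
needles_needed(ε) = ceil(Int, (2 * sqrt(5.63) / ε)^2)

needles_needed(0.001)   # ≈ 22.5 million
```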
Now returning to the choice of $d$ (i.e. how far apart to draw the horizontal lines - given needle length $\ell$). As shown above, to minimise the variance we want $d = \ell$. We can demonstrate this with simulations and charts. We'll run $1,000$ trials of $10,000$ needles each for four parameter settings of $d$.
Each point in the chart below is an estimate of $\pi$ and the red lines show the $95\%$ confidence interval estimated as above. As you can see, the case $d = \ell$ does indeed have the lowest variance.
```julia
using Gadfly # Plotting package

function plot_trials(;ℓ=1, d=1)
    estimates = zeros(1000)
    for k in 1:1000
        estimates[k] = π_est(10_000, ℓ, d)
    end
    l1 = layer(yintercept=[π - 2σ(10_000, ℓ, d), π + 2σ(10_000, ℓ, d)], Geom.hline(color="red"))
    l2 = layer(x=1:1000, y=estimates, Geom.point)
    guides = (Coord.cartesian(ymin=2.8, ymax=3.5),
              Guide.xticks(ticks=collect(200:200:1000)),
              Guide.yticks(ticks=[2.8, 2.9, 3.0, 3.1, 3.14, 3.2, 3.3, 3.4, 3.5]),
              Guide.xlabel("trial"), Guide.ylabel("estimate"), Guide.title("ℓ = $ℓ, d = $d"))
    plot(l1, l2, guides...)
end;

set_default_plot_size(24cm, 16cm)
vstack(hstack(plot_trials(ℓ=1, d=1), plot_trials(ℓ=1, d=2)),
       hstack(plot_trials(ℓ=1, d=4), plot_trials(ℓ=1, d=8)))
```